Just Another Gibbs Additive Modeller: Interfacing JAGS and mgcv
The BUGS language offers a very flexible way of specifying complex
statistical models for the purposes of Gibbs sampling, while its JAGS variant
offers very convenient R integration via the rjags package. However, including
smoothers in JAGS models can involve some quite tedious coding, especially for
multivariate or adaptive smoothers. Further, if an additive smooth structure is
required then some care is needed, in order to centre smooths appropriately,
and to find appropriate starting values. R package mgcv implements a wide range
of smoothers, all in a manner appropriate for inclusion in JAGS code, and
automates centring and other smooth setup tasks. The purpose of this note is to
describe an interface between mgcv and JAGS, based around an R function,
`jagam', which takes a generalized additive model (GAM) as specified in mgcv
and automatically generates the JAGS model code and data required for inference
about the model via Gibbs sampling. Although the auto-generated JAGS code can
be run as is, the expectation is that the user would wish to modify it in order
to add complex stochastic model components readily specified in JAGS. A simple
interface is also provided for visualisation and further inference about the
estimated smooth components using standard mgcv functionality. The methods
described here will be unnecessarily inefficient if all that is required is
fully Bayesian inference about a standard GAM, rather than the full flexibility
of JAGS. In that case the BayesX package would be more efficient.
Comment: Submitted to the Journal of Statistical Software
Inferring UK COVID-19 fatal infection trajectories from daily mortality data: were infections already in decline before the UK lockdowns?
The number of new infections per day is a key quantity for effective epidemic
management. It can be estimated relatively directly by testing of random
population samples. Without such direct epidemiological measurement, other
approaches are required to infer whether the number of new cases is likely to
be increasing or decreasing: for example, estimating the pathogen effective
reproduction number, R, using data gathered from the clinical response to the
disease. For Covid-19 (SARS-CoV-2) such R estimation is heavily dependent on
modelling assumptions, because the available clinical case data are
opportunistic observational data subject to severe temporal confounding. Given
this difficulty it is useful to retrospectively reconstruct the time course of
infections from the least compromised available data, using minimal prior
assumptions. A Bayesian inverse problem approach applied to UK data on first
wave Covid-19 deaths and the disease duration distribution suggests that fatal
infections were in decline before full UK lockdown (24 March 2020), and that
fatal infections in Sweden started to decline only a day or two later. An
analysis of UK data using the model of Flaxman et al. (2020, Nature 584) gives
the same result under relaxation of its prior assumptions on R, suggesting an
enhanced role for non-pharmaceutical interventions (NPIs) short of full
lockdown in the UK context. Similar patterns appear to have occurred in the
subsequent two lockdowns.
Comment: Updated using data up to February 2021. Peer reviewed version in
press in Biometrics
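To make the "inverse problem" framing concrete, here is a minimal sketch of the forward direction only: how a trajectory of fatal infections and a disease duration distribution could combine to give expected daily deaths. It is an illustration under stated assumptions, not the model used in the paper. The function name `expected_deaths`, the discrete convolution form, and all numbers are hypothetical.

```python
import numpy as np

def expected_deaths(fatal_infections, duration_pmf):
    """Expected deaths per day, assuming deaths are a discrete convolution
    of daily fatal infections with the infection-to-death duration pmf.
    (Assumed forward model for illustration; not taken from the paper.)"""
    full = np.convolve(fatal_infections, duration_pmf)
    # Truncate to the observation window covered by the infection series.
    return full[:len(fatal_infections)]

# Made-up toy inputs: daily fatal infections and a 4-point duration pmf,
# where pmf[s] is the probability of death s days after infection.
infections = np.array([0.0, 10.0, 20.0, 30.0, 20.0, 10.0, 0.0, 0.0, 0.0, 0.0])
pmf = np.array([0.0, 0.2, 0.5, 0.3])

deaths = expected_deaths(infections, pmf)
print("peak infection day:", int(np.argmax(infections)))
print("peak death day:", int(np.argmax(deaths)))
```

In this toy setting the death curve peaks later than the infection curve. The inverse problem runs the other way: given observed deaths and the duration distribution, infer the infection trajectory that produced them.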
An Extended Empirical Saddlepoint Approximation for Intractable Likelihoods
The challenges posed by complex stochastic models used in computational
ecology, biology and genetics have stimulated the development of approximate
approaches to statistical inference. Here we focus on Synthetic Likelihood
(SL), a procedure that reduces the observed and simulated data to a set of
summary statistics, and quantifies the discrepancy between them through a
synthetic likelihood function. SL requires little tuning, but it relies on the
approximate normality of the summary statistics. We relax this assumption by
proposing a novel, more flexible, density estimator: the Extended Empirical
Saddlepoint approximation. In addition to proving the consistency of SL, under
either the new or the Gaussian density estimator, we illustrate the method
using two examples. One of these is a complex individual-based forest model for
which SL offers one of the few practical possibilities for statistical
inference. The examples show that the new density estimator is able to capture
large departures from normality, while being scalable to high dimensions, and
this in turn leads to more accurate parameter estimates, relative to the
Gaussian alternative. The new density estimator is implemented by the esaddle R
package, which can be found on the Comprehensive R Archive Network (CRAN).
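The Gaussian version of synthetic likelihood described in this abstract can be sketched in a few lines. This is a minimal illustration assuming a user-supplied simulator and summary function; it is not the esaddle implementation and does not include the Extended Empirical Saddlepoint estimator. The names `synthetic_loglik`, `simulate` and `summarise`, and the toy normal model, are hypothetical.

```python
import numpy as np

def synthetic_loglik(theta, observed_stats, simulate, summarise,
                     n_sim=200, rng=None):
    """Gaussian synthetic log-likelihood of the observed summary statistics.

    Simulates n_sim data sets at theta, reduces each to summary statistics,
    fits a multivariate normal to those statistics, and evaluates its log
    density at the observed statistics."""
    rng = np.random.default_rng() if rng is None else rng
    stats = np.array([summarise(simulate(theta, rng)) for _ in range(n_sim)])
    mu = stats.mean(axis=0)
    sigma = np.atleast_2d(np.cov(stats, rowvar=False))
    diff = np.atleast_1d(observed_stats) - mu
    _, logdet = np.linalg.slogdet(sigma)
    quad = diff @ np.linalg.solve(sigma, diff)
    return -0.5 * (diff.size * np.log(2 * np.pi) + logdet + quad)

# Hypothetical toy model: 50 normal draws with unknown mean theta.
def simulate(theta, rng):
    return rng.normal(theta, 1.0, size=50)

# Summary statistics: sample mean and sample variance.
def summarise(y):
    return np.array([y.mean(), y.var(ddof=1)])

rng = np.random.default_rng(0)
observed = summarise(simulate(2.0, rng))  # pretend this is the real data
ll_true = synthetic_loglik(2.0, observed, simulate, summarise, rng=rng)
ll_far = synthetic_loglik(5.0, observed, simulate, summarise, rng=rng)
print(ll_true > ll_far)
```

The normal fit to the simulated statistics is the step that relies on approximate normality; the abstract's proposal is to replace that density estimator with a more flexible one.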
Scalable visualisation methods for modern Generalized Additive Models
In the last two decades the growth of computational resources has made it
possible to handle Generalized Additive Models (GAMs) that formerly were too
costly for serious applications. However, the growth in model complexity has
not been matched by improved visualisations for model development and results
presentation. Motivated by an industrial application in electricity load
forecasting, we identify the areas where the lack of modern visualisation tools
for GAMs is particularly severe, and we address the shortcomings of existing
methods by proposing a set of visual tools that a) are fast enough for
interactive use, b) exploit the additive structure of GAMs, c) scale to large
data sets and d) can be used in conjunction with a wide range of response
distributions. All the new visual methods proposed in this work are implemented
by the mgcViz R package, which can be found on the Comprehensive R Archive
Network.
Shape constrained additive models
A framework is presented for generalized additive modelling under shape constraints on the component functions of the linear predictor of the GAM. We represent shape constrained model components by mildly non-linear extensions of P-splines. Models can contain multiple shape constrained and unconstrained terms as well as shape constrained multi-dimensional smooths. The constraints considered are on the sign of the first and/or second derivatives of the smooth terms. A key advantage of the approach is that it facilitates efficient estimation of smoothing parameters as an integral part of model estimation, via GCV or AIC, and numerically robust algorithms for this are presented. We also derive simulation free approximate Bayesian confidence intervals for the smooth components, which are shown to achieve close to nominal coverage probabilities. Applications are presented using real data examples including the risk of disease in relation to proximity to municipal incinerators and the association between air pollution and health.
COVID-19 and the difficulty of inferring epidemiological parameters from clinical data
Knowing the infection fatality ratio (IFR) is of crucial importance for
evidence-based epidemic management: for immediate planning; for balancing the
life years saved against the life years lost due to the consequences of
management; and for evaluating the ethical issues associated with the tacit
willingness to pay substantially more for life years lost to the epidemic, than
for those to other diseases. Against this background Verity et al. (2020,
Lancet Infectious Diseases) have rapidly assembled case data and used
statistical modelling to infer the IFR for COVID-19. We have attempted an
in-depth statistical review of their approach, to identify to what extent the
data are sufficiently informative about the IFR to play a greater role than the
modelling assumptions, and have tried to identify those assumptions that appear
to play a key role. Given the difficulties with other data sources, we provide
a crude alternative analysis based on the Diamond Princess Cruise ship data and
case data from China, and argue that, given the data problems, modelling of
clinical data to obtain the IFR can only be a stop-gap measure. What is needed
is near direct measurement of epidemic size by PCR and/or antibody testing of
random samples of the at-risk population.
Comment: Version accepted by the Lancet Infectious Diseases. See previous
version for less terse presentation